TEI exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

The Challenge of Parallel Text Processing

Internal identifier: 000317 (Main/Exploration); previous: 000316; next: 000318

Authors: Milena Slavcheva [Germany, Bulgaria]

Source:

RBID : ISTEX:DCFA54D43AAAE82C454D2EBE8F20C9BE1204FD31

Abstract

Abstract: The paper presents the technology of building a large German-French parallel corpus consisting of official documents of the European Union and Switzerland, and private and public organisations in France and Germany. The texts are morphosyntactically annotated, aligned at the sentence level and marked up in conformance with the TEI guidelines for standardised representation. The multi-level alignment method is applied; its precision is improved due to the correlation with the constraints of the classical alignment method of Gale and Church. The alignment information is encoded externally to the parallel text documents. The process of creating the corpus is an interesting algorithm of applying a number of software tools and adjusting intermediate production results.
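
The abstract refers to the classical alignment method of Gale and Church, which pairs sentences by comparing their lengths in characters. As a rough illustration only, the sketch below implements a simplified length-based aligner of that kind. It is not the multi-level method used in the paper. The function names (match_cost, align) are hypothetical, and the constants are the values commonly cited from Gale and Church (1993), with the length normalisation following one common formulation.

```python
import math
import sys

# Bead types (source sentences, target sentences) and their prior
# probabilities as commonly cited from Gale and Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0    # expected target characters per source character
S2 = 6.8   # variance of the length difference per character


def match_cost(len_src, len_tgt, prior):
    """Negative log-probability of pairing spans of the given lengths."""
    mean = (len_src + len_tgt / C) / 2.0
    if mean == 0:
        return -math.log(prior)
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2)
    # Two-sided tail probability of a standard normal variable.
    tail = math.erfc(abs(delta) / math.sqrt(2.0))
    return -math.log(max(tail, sys.float_info.min) * prior)


def align(src_lens, tgt_lens):
    """Return beads as (source indices, target indices) via dynamic programming."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), prior in PRIORS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = match_cost(sum(src_lens[i:ni]), sum(tgt_lens[j:nj]), prior)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (di, dj)
    # Walk back from the end to recover the cheapest sequence of beads.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads


# Sentence lengths in characters (invented values for illustration).
print(align([10, 20, 30], [11, 19, 31]))
print(align([50, 50], [100]))
```

Because only lengths are compared, the result can be stored as pairs of sentence indices, separately from the texts themselves, which is in the spirit of the external alignment encoding the abstract mentions.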

Url:
DOI: 10.1007/3-540-45323-7_23


Affiliations:



The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Challenge of Parallel Text Processing</title>
<author>
<name sortKey="Slavcheva, Milena" sort="Slavcheva, Milena" uniqKey="Slavcheva M" first="Milena" last="Slavcheva">Milena Slavcheva</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:DCFA54D43AAAE82C454D2EBE8F20C9BE1204FD31</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1007/3-540-45323-7_23</idno>
<idno type="url">https://api.istex.fr/document/DCFA54D43AAAE82C454D2EBE8F20C9BE1204FD31/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000115</idno>
<idno type="wicri:Area/Istex/Curation">000115</idno>
<idno type="wicri:Area/Istex/Checkpoint">000265</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000265</idno>
<idno type="wicri:doubleKey">0302-9743:2000:Slavcheva M:the:challenge:of</idno>
<idno type="wicri:Area/Main/Merge">000343</idno>
<idno type="wicri:Area/Main/Curation">000317</idno>
<idno type="wicri:Area/Main/Exploration">000317</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">The Challenge of Parallel Text Processing</title>
<author>
<name sortKey="Slavcheva, Milena" sort="Slavcheva, Milena" uniqKey="Slavcheva M" first="Milena" last="Slavcheva">Milena Slavcheva</name>
<affiliation wicri:level="3">
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Institute for German Language R5, 6-13, D-68161, Mannheim</wicri:regionArea>
<placeName>
<region type="land" nuts="1">Bade-Wurtemberg</region>
<region type="district" nuts="2">District de Karlsruhe</region>
<settlement type="city">Mannheim</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<country xml:lang="fr">Bulgarie</country>
<wicri:regionArea>Bulgarian Academy of Sciences, CLPII, LMD, 25 A, Acad. G. Bonchev St., 1113, Sofia</wicri:regionArea>
<placeName>
<settlement type="city">Sofia</settlement>
<region nuts="2">Sofia-ville (oblast)</region>
</placeName>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Bulgarie</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2000</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">DCFA54D43AAAE82C454D2EBE8F20C9BE1204FD31</idno>
<idno type="DOI">10.1007/3-540-45323-7_23</idno>
<idno type="ChapterID">23</idno>
<idno type="ChapterID">Chap23</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: The paper presents the technology of building a large German-French parallel corpus consisting of official documents of the European Union and Switzerland, and private and public organisations in France and Germany. The texts are morphosyntactically annotated, aligned at the sentence level and marked up in conformance with the TEI guidelines for standardised representation. The multi-level alignment method is applied; its precision is improved due to the correlation with the constraints of the classical alignment method of Gale and Church. The alignment information is encoded externally to the parallel text documents. The process of creating the corpus is an interesting algorithm of applying a number of software tools and adjusting intermediate production results.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
<li>Bulgarie</li>
</country>
<region>
<li>Bade-Wurtemberg</li>
<li>District de Karlsruhe</li>
<li>Sofia-ville (oblast)</li>
</region>
<settlement>
<li>Mannheim</li>
<li>Sofia</li>
</settlement>
</list>
<tree>
<country name="Allemagne">
<region name="Bade-Wurtemberg">
<name sortKey="Slavcheva, Milena" sort="Slavcheva, Milena" uniqKey="Slavcheva M" first="Milena" last="Slavcheva">Milena Slavcheva</name>
</region>
</country>
<country name="Bulgarie">
<region name="Sofia-ville (oblast)">
<name sortKey="Slavcheva, Milena" sort="Slavcheva, Milena" uniqKey="Slavcheva M" first="Milena" last="Slavcheva">Milena Slavcheva</name>
</region>
<name sortKey="Slavcheva, Milena" sort="Slavcheva, Milena" uniqKey="Slavcheva M" first="Milena" last="Slavcheva">Milena Slavcheva</name>
</country>
</tree>
</affiliations>
</record>
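
As a minimal sketch of how a record like the one above might be read programmatically, the following Python example extracts the title, the DOI and the affiliation cities from a shortened copy of the record. One caveat: as displayed here, the record uses the wicri: prefix without declaring it, and a namespace-aware parser such as xml.etree.ElementTree will likely reject that. The example therefore binds the prefix to a placeholder URI (urn:example:wicri), which is an assumption and not the project's real namespace. The helper names parse_record and summarize are hypothetical.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI: an assumption, not the real Wicri namespace.
WICRI_NS = "urn:example:wicri"

# Shortened copy of the record shown above.
SAMPLE = """<record>
<TEI wicri:istexFullTextTei="biblStruct:series">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">The Challenge of Parallel Text Processing</title>
</titleStmt>
<publicationStmt>
<idno type="doi">10.1007/3-540-45323-7_23</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<author>
<affiliation wicri:level="3">
<country xml:lang="fr">Allemagne</country>
<placeName>
<settlement type="city">Mannheim</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="3">
<country xml:lang="fr">Bulgarie</country>
<placeName>
<settlement type="city">Sofia</settlement>
</placeName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(text):
    """Parse a record after binding the undeclared wicri: prefix on the root."""
    patched = text.replace(
        "<record>", '<record xmlns:wicri="%s">' % WICRI_NS, 1)
    return ET.fromstring(patched)


def summarize(root):
    """Collect the title, the DOI and (city, country) pairs from a record."""
    places = []
    for aff in root.iter("affiliation"):
        city = aff.findtext("placeName/settlement")
        if city:  # skip affiliations that carry only a country
            places.append((city, aff.findtext("country")))
    return {
        "title": root.findtext(".//titleStmt/title"),
        "doi": root.findtext(".//publicationStmt/idno[@type='doi']"),
        "places": places,
    }


summary = summarize(parse_record(SAMPLE))
print(summary)
```

The country names come back in French because that is how they are stored in the record (xml:lang="fr").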

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Ticri/explor/TeiVM2/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000317 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000317 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Ticri
   |area=    TeiVM2
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:DCFA54D43AAAE82C454D2EBE8F20C9BE1204FD31
   |texte=   The Challenge of Parallel Text Processing
}}
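
If many such links are needed, the template call above could be generated rather than typed by hand. The sketch below is one possible way to do that. The function name explor_link and its argument names are hypothetical, while the template parameter names (wiki, area, flux, étape, type, clé, texte) are taken from the example above.

```python
def explor_link(wiki, area, flux, step, key_type, key, text):
    """Build an {{Explor lien}} template call in the layout shown above."""
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", step),
        ("type", key_type),
        ("clé", key),
        ("texte", text),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "|name=" to ten characters so the values line up in one column.
        lines.append("   " + ("|%s=" % name).ljust(10) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Wicri/Ticri",
    area="TeiVM2",
    flux="Main",
    step="Exploration",
    key_type="RBID",
    key="ISTEX:DCFA54D43AAAE82C454D2EBE8F20C9BE1204FD31",
    text="The Challenge of Parallel Text Processing",
)
print(link)
```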

Wicri

This area was generated with Dilib version V0.6.31.
Data generation: Mon Oct 30 21:59:18 2017. Site generation: Sun Feb 11 23:16:06 2024